
feat: lazy DFA cache (R1+R2) over OPTIMIZED_NFA for large anchor-free patterns#67

Merged
jbachorik merged 29 commits into main from feat/lazy-dfa-r1-r2 on May 29, 2026


@jbachorik (Collaborator) commented May 29, 2026

What does this PR do?

Adds a lazily-materialized DFA cache (LazyDFACache) over NFA execution for patterns with ≥300 NFA states, no anchors, and no capturing groups. On the warm path, matching cost drops from O(NFA-states) per character to a single int[128] array read.
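To make the R1 + R2 idea concrete, here is a minimal single-threaded sketch of a lazy DFA layered over an NFA step function. It is not the PR's LazyDFACache: the class name, the BiFunction-based step signature, the BitSet state sets and the toy NFA for [ab]*abb are all assumptions chosen for illustration. R1 appears as interning each distinct NFA state set to an integer DFA id, and R2 as a per-id int[128] table that turns a repeated ASCII transition into a single array read.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical sketch of R1 (state-set interning) + R2 (per-state int[128] ASCII table).
// Single-threaded for brevity; names and signatures are illustrative, not the PR's API.
final class LazyDfaSketch {
    private static final int UNCOMPUTED = -1;

    private final BiFunction<BitSet, Character, BitSet> nfaStep; // closure(stateSet, c)
    private final BitSet accepting;                               // NFA accept states
    private final Map<BitSet, Integer> ids = new HashMap<>();     // R1: state set -> DFA id
    private final List<BitSet> sets = new ArrayList<>();          // DFA id -> state set
    private final List<int[]> asciiTables = new ArrayList<>();    // R2: DFA id -> next id per ASCII char
    private final int start;

    LazyDfaSketch(BitSet startSet, BitSet accepting, BiFunction<BitSet, Character, BitSet> nfaStep) {
        this.nfaStep = nfaStep;
        this.accepting = accepting;
        this.start = intern(startSet);
    }

    private int intern(BitSet set) {
        Integer known = ids.get(set);
        if (known != null) return known;
        int id = sets.size();
        sets.add(set);
        int[] table = new int[128];
        Arrays.fill(table, UNCOMPUTED);
        asciiTables.add(table);
        ids.put(set, id);
        return id;
    }

    private int next(int state, char c) {
        if (c < 128) {
            int cached = asciiTables.get(state)[c];
            if (cached != UNCOMPUTED) return cached;                 // warm path: one array read
            int computed = intern(nfaStep.apply(sets.get(state), c)); // cold path: NFA step + intern
            asciiTables.get(state)[c] = computed;
            return computed;
        }
        return intern(nfaStep.apply(sets.get(state), c));            // non-ASCII: no table slot
    }

    boolean matches(CharSequence input) {
        int state = start;
        for (int i = 0; i < input.length(); i++) {
            state = next(state, input.charAt(i));
            if (sets.get(state).isEmpty()) return false;             // dead state: no NFA thread left
        }
        return sets.get(state).intersects(accepting);
    }

    int cachedStates() { return sets.size(); }

    // Toy NFA for [ab]*abb: state 0 loops on a/b, 0 -a-> 1 -b-> 2 -b-> 3 (accepting).
    static LazyDfaSketch endsWithAbb() {
        BitSet start = new BitSet();
        start.set(0);
        BitSet accept = new BitSet();
        accept.set(3);
        return new LazyDfaSketch(start, accept, (set, c) -> {
            BitSet out = new BitSet();
            if (set.get(0) && (c == 'a' || c == 'b')) out.set(0);
            if (set.get(0) && c == 'a') out.set(1);
            if (set.get(1) && c == 'b') out.set(2);
            if (set.get(2) && c == 'b') out.set(3);
            return out;
        });
    }
}
```

A second call to matches("aabb") after the first one leaves cachedStates() unchanged, because every transition it needs is already stored in an int[128] row.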

Motivation

OPTIMIZED_NFA patterns recompute closure(stateSet, c) on every character of every matches() call. For large NFA patterns this is expensive. Recommendations R1 (lazy DFA state interning) and R2 (per-state ASCII transition table) from doc/plans/glob-perf-nfa-improvements.md address this. The glob_perf benchmark shows lazydfa is the consistent winner over plain NFA simulation once the cache is warm.

Related Issue(s)

Implements R1 + R2 from doc/plans/glob-perf-nfa-improvements.md.

Change Type

  • Bug fix
  • New feature
  • Performance improvement
  • Refactoring (no functional change)
  • Documentation
  • Test improvement
  • Build/CI change

Checklist

  • I have read the CONTRIBUTING.md guidelines
  • All existing tests pass (./gradlew build)
  • I have added tests for my changes
  • I have updated documentation (if applicable)
  • My commits are signed

Performance Impact

New LazyDFABenchmark with three variants measures the full performance envelope:

  • hitPath — warm cache, all DFA transitions cached → single int[128] read per char
  • missPath — cold cache, fresh diverse inputs → nfaStep + interning overhead
  • frozenPath — cache at 4096-state cap, all new transitions fall back to NFA

Baseline comparison: same patterns via NFAFallbackBenchmark.

Additional Notes

Design decisions:

  • Cache cap is fixed at 4096 DFA states. When the cap is reached the cache freezes; new transitions fall back to plain NFA stepping. This matches glob_perf's documented approach to prevent the 300× hit-path regression observed with unbounded caches at 65k patterns.
  • Generated class implements NfaStep directly (public apply bridge → package-private nfaStep) instead of LambdaMetafactory INVOKEDYNAMIC — hidden classes defined via defineHiddenClass cannot name themselves in lambda bootstrap descriptors.
  • PatternAnalyzer short-circuits DFA construction entirely for LAZY_DFA-eligible patterns (saves SubsetConstructor cost).
  • Thread-safety: ConcurrentHashMap.computeIfAbsent for state interning; VarHandle.storeStoreFence() before publishing int[128] ASCII table references, so that neither the JIT nor a weakly-ordered CPU can reorder the table's element stores after the reference store (stale null reads safely fall back to lookupOrCompute).
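The cap-and-freeze policy can be sketched as follows, again single-threaded and with hypothetical names (FreezingLazyDfa, a BiFunction step, a toy [ab]*abb NFA). Once the cache holds `cap` DFA states it refuses to intern new ones: transitions that are already cached keep working, while a transition that would need a new state switches the rest of the match to plain NFA stepping. This sketch never re-enters the cached DFA after falling back; whether the PR does is not stated here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical sketch of the cap-and-freeze policy (the PR uses a cap of 4096; assumes cap >= 1).
final class FreezingLazyDfa {
    private final int cap;
    private final BiFunction<BitSet, Character, BitSet> nfaStep;
    private final BitSet accepting;
    private final Map<BitSet, Integer> ids = new HashMap<>();
    private final List<BitSet> sets = new ArrayList<>();
    private final List<int[]> tables = new ArrayList<>();
    private final int start;

    FreezingLazyDfa(int cap, BitSet startSet, BitSet accepting,
                    BiFunction<BitSet, Character, BitSet> nfaStep) {
        this.cap = cap;
        this.nfaStep = nfaStep;
        this.accepting = accepting;
        this.start = intern(startSet);
    }

    // Returns the DFA id for the set, or -1 if the set is new and the cache is frozen.
    private int intern(BitSet set) {
        Integer known = ids.get(set);
        if (known != null) return known;
        if (sets.size() >= cap) return -1;        // frozen: refuse to grow
        int id = sets.size();
        sets.add(set);
        ids.put(set, id);
        int[] table = new int[128];
        Arrays.fill(table, -1);
        tables.add(table);
        return id;
    }

    boolean matches(CharSequence in) {
        int state = start;
        BitSet fallback = null;                   // non-null once we have left the cached DFA
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            if (fallback != null) {               // frozen path: plain NFA stepping
                fallback = nfaStep.apply(fallback, c);
                continue;
            }
            int[] table = tables.get(state);
            int next = c < 128 ? table[c] : -1;
            if (next < 0) {
                BitSet nextSet = nfaStep.apply(sets.get(state), c);
                next = intern(nextSet);
                if (next < 0) {                   // unknown set and cache frozen
                    fallback = nextSet;
                    continue;
                }
                if (c < 128) table[c] = next;     // cache the new transition
            }
            state = next;
        }
        BitSet last = fallback != null ? fallback : sets.get(state);
        return last.intersects(accepting);
    }

    boolean frozen() { return sets.size() >= cap; }

    // Toy NFA for [ab]*abb, used only to exercise the sketch.
    static FreezingLazyDfa endsWithAbb(int cap) {
        BitSet start = new BitSet();
        start.set(0);
        BitSet accept = new BitSet();
        accept.set(3);
        return new FreezingLazyDfa(cap, start, accept, (set, c) -> {
            BitSet out = new BitSet();
            if (set.get(0) && (c == 'a' || c == 'b')) out.set(0);
            if (set.get(0) && c == 'a') out.set(1);
            if (set.get(1) && c == 'b') out.set(2);
            if (set.get(2) && c == 'b') out.set(3);
            return out;
        });
    }
}
```

With a cap of 2, the sketch freezes partway through "aabb" yet still returns the correct answer, because the fallback path steps the same NFA.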

🤖 Generated with Claude Code

jbachorik added the AI Generated or assisted by AI label May 29, 2026
@datadog-prod-us1-6

This comment has been minimized.

@jbachorik (Collaborator, Author) commented May 29, 2026

Benchmark results: 3-fork baseline run

Environment: JDK 21.0.10 (Zulu), 3 forks × 10 measurement iterations, (?:[a-z][0-9]){200} on 400-char input, macOS ARM.

| Benchmark | ops/ms | ±99% CI | Notes |
|---|---|---|---|
| jdkMissBaseline | 36,187 | ±484 | JDK, diverse inputs; early exit on mismatch |
| frozenPath | 2,715 | ±67 | Frozen cache; also early-exits on random inputs |
| missPath | 2,665 | ±32 | LAZY_DFA cold path (interning overhead) |
| jdkHitBaseline | 1,335 | ±17 | JDK, fixed 400-char matching input |
| hitPath | 987 | ±78 | LAZY_DFA warm path (int[128] read per char) |

hitPath is 26% slower than JDK: root-cause candidates

The warm path should theoretically win on repeated identical input (one array read per character vs the JDK's backtracking matcher), but it doesn't. Likely overhead sources:

  1. Virtual dispatch through LazyDFACache.matches(NfaStep): even though lambda allocation is avoided (the generated class implements NfaStep directly), every matches() call crosses into LazyDFACache through a call frame that the JIT may not inline.
  2. The (int[]) asciiTables[dfaState] checkcast on every character, which the JIT may not eliminate on the hot path.
  3. JIT variance across forks: hitPath has a ±78 CI vs ±17 for JDK, suggesting JIT instability; the true gap could be narrower.

frozenPath and jdkMissBaseline comparisons are misleading

Both are dominated by early-exit on random non-matching inputs. The frozen-path benchmark needs to use MATCH_INPUT (always-matches) to give a fair frozen-vs-JDK number.
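One way to build an always-matching input for (?:[a-z][0-9]){200} is shown below as a hypothetical helper (MatchInputs and its methods are illustrative names, not the benchmark's actual MATCH_INPUT code). It produces 200 letter/digit pairs, so any matcher has to traverse all 400 characters. A random [a-z0-9] string, by contrast, almost never matches, and a matcher can reject it at the first character that breaks the letter/digit alternation, which is why such inputs mostly measure early exit.

```java
import java.util.Random;
import java.util.regex.Pattern;

// Hypothetical input builders for (?:[a-z][0-9]){200}; not the benchmark's actual code.
final class MatchInputs {
    static final Pattern PATTERN = Pattern.compile("(?:[a-z][0-9]){200}");

    // Always matches: 200 pairs of one letter followed by one digit (400 chars total).
    static String matchInput(long seed) {
        Random rnd = new Random(seed);
        StringBuilder sb = new StringBuilder(400);
        for (int i = 0; i < 200; i++) {
            sb.append((char) ('a' + rnd.nextInt(26))); // [a-z]
            sb.append((char) ('0' + rnd.nextInt(10))); // [0-9]
        }
        return sb.toString();
    }

    // Random [a-z0-9] text: almost never matches and is usually rejected within a few chars.
    static String randomInput(long seed) {
        String alphabet = "abcdefghijklmnopqrstuvwxyz0123456789";
        Random rnd = new Random(seed);
        StringBuilder sb = new StringBuilder(400);
        for (int i = 0; i < 400; i++) {
            sb.append(alphabet.charAt(rnd.nextInt(alphabet.length())));
        }
        return sb.toString();
    }
}
```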

Follow-up work (not blocking this PR)

  • Profile hitPath with async-profiler (-prof async) to identify whether overhead is in the checkcast, virtual dispatch, or call-frame depth
  • Consider inlining LazyDFACache.matches() into the generated matches(): the warm path is ~15 instructions, and inlining eliminates the interface dispatch and checkcast
  • Fix FrozenState to use MATCH_INPUT (always-matches) so frozenPath measures full-traversal NFA fallback, not early-reject
  • Re-run after inlining to confirm whether hitPath beats JDK on fully-warm input

🤖 Generated with Claude Code

Copilot AI left a comment

Pull request overview

Adds a new LAZY_DFA strategy that lazily caches DFA-style transitions over NFA execution for large, group-free, anchor-free patterns to improve warm-path matching performance.

Changes:

  • Adds LazyDFACache, NfaStep, and StateSetKey runtime support.
  • Adds LazyDFABytecodeGenerator and routes qualifying patterns via PatternAnalyzer/RuntimeCompiler.
  • Adds unit tests, a JMH benchmark, and a design spec for the lazy DFA cache.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.

Summary per file:

| File | Description |
|---|---|
| reggie-runtime/src/main/java/com/datadoghq/reggie/runtime/LazyDFACache.java | Implements lazy DFA state interning, ASCII transition tables, freeze/fallback behavior. |
| reggie-runtime/src/main/java/com/datadoghq/reggie/runtime/NfaStep.java | Defines the generated NFA step interface used by the cache. |
| reggie-runtime/src/main/java/com/datadoghq/reggie/runtime/StateSetKey.java | Adds content-based keys for cached NFA state sets. |
| reggie-runtime/src/main/java/com/datadoghq/reggie/runtime/RuntimeCompiler.java | Wires LAZY_DFA generation into runtime bytecode compilation. |
| reggie-codegen/src/main/java/com/datadoghq/reggie/codegen/codegen/LazyDFABytecodeGenerator.java | Emits lazy-DFA-specific bytecode, NFA tables, and bounded/full match methods. |
| reggie-codegen/src/main/java/com/datadoghq/reggie/codegen/analysis/PatternAnalyzer.java | Adds LAZY_DFA strategy selection and capturing-group detection. |
| reggie-runtime/src/test/java/com/datadoghq/reggie/runtime/LazyDFACacheTest.java | Tests cache behavior, freeze/fallback, non-ASCII, and concurrency basics. |
| reggie-runtime/src/test/java/com/datadoghq/reggie/runtime/StateSetKeyTest.java | Tests state-set key equality and hash behavior. |
| reggie-runtime/src/test/java/com/datadoghq/reggie/runtime/LazyDFABytecodeGeneratorTest.java | Tests generated lazy-DFA matcher behavior and cache sharing. |
| reggie-codegen/src/test/java/com/datadoghq/reggie/codegen/analysis/PatternAnalyzerLazyDFATest.java | Tests analyzer routing into and away from LAZY_DFA. |
| reggie-benchmark/src/main/java/com/datadoghq/reggie/benchmark/LazyDFABenchmark.java | Adds JMH benchmarks for hit, miss, and frozen paths. |
| docs/superpowers/specs/2026-05-28-lazy-dfa-design.md | Documents the lazy DFA design, cache policy, and test plan. |
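The StateSetKey entry describes content-based keys for cached NFA state sets. A minimal sketch of what such a key has to do follows; it is a hypothetical class, not the PR's StateSetKey. The main pitfall it avoids is that a `record Key(int[] states)` would compare its array component by reference, so two equal state sets would intern to different DFA states. Canonicalizing (dedupe + sort) and caching the hash keeps lookups cheap and order-independent.

```java
import java.util.Arrays;

// Hypothetical content-based key for an NFA state set (not the PR's StateSetKey).
final class StateSetKeySketch {
    private final int[] states; // sorted, duplicate-free NFA state ids
    private final int hash;     // cached: the key is hashed on every interning lookup

    StateSetKeySketch(int... states) {
        // Canonical form; also acts as a defensive copy, so later caller mutation cannot break the key.
        this.states = Arrays.stream(states).distinct().sorted().toArray();
        this.hash = Arrays.hashCode(this.states);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof StateSetKeySketch other
                && hash == other.hash
                && Arrays.equals(states, other.states);
    }

    @Override
    public int hashCode() {
        return hash;
    }
}
```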


Comment thread reggie-runtime/src/main/java/com/datadoghq/reggie/runtime/LazyDFACache.java Outdated
Comment thread reggie-runtime/src/main/java/com/datadoghq/reggie/runtime/LazyDFACache.java Outdated
Comment thread docs/superpowers/specs/2026-05-28-lazy-dfa-design.md
@jbachorik (Collaborator, Author)

Follow-up: Object[]→int[][] fix, benchmark update

Profiling (JMH -prof stack + -prof gc) identified the hot-path bottleneck: (int[]) asciiTables[dfaState] emitted a CHECKCAST [I on every character (400× per call on a 400-char input). Fix: change the asciiTables field type from Object[] to int[][]. This is a four-line change with no logic change, and the VarHandle.storeStoreFence() fence remains valid.

Before vs After: 3 forks × 10 iterations, JDK 21, (?:[a-z][0-9]){200} / 400-char input

| Benchmark | Before (ops/ms) | After (ops/ms) | Δ |
|---|---|---|---|
| hitPath (LAZY_DFA warm) | 987 ±78 | 2243 ±174 | +127% |
| jdkHitBaseline | 1335 ±17 | 1334 ±19 | flat |

The LAZY_DFA warm path now beats JDK by 68% (2243 vs 1334 ops/ms).

Commit: 0bb1b4d
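The Object[] to int[][] change can be illustrated with two equivalent table-walking loops. This is an illustrative class, not the PR's generated bytecode, and it assumes ASCII input with every transition already filled in. Compiling it and running `javap -c` shows a `checkcast` to `[I` inside the loop of runObjectRows and none in runIntRows, because an element loaded from an int[][] is already statically typed int[].

```java
// Illustrative comparison of the two asciiTables layouts (not the PR's generated code).
final class TableLayoutSketch {
    // Before: rows stored as Object, so every character pays a cast back to int[].
    static int runObjectRows(Object[] tables, int state, CharSequence in) {
        for (int i = 0; i < in.length(); i++) {
            int[] row = (int[]) tables[state]; // emits CHECKCAST [I on every iteration
            state = row[in.charAt(i)];
        }
        return state;
    }

    // After: the element type is already int[], so the row load needs no cast.
    static int runIntRows(int[][] tables, int state, CharSequence in) {
        for (int i = 0; i < in.length(); i++) {
            state = tables[state][in.charAt(i)];
        }
        return state;
    }

    // Toy 2-state table over ASCII: 'a' toggles the state, every other char keeps it.
    static int[][] toggleOnA() {
        int[][] t = new int[2][128];
        for (int c = 0; c < 128; c++) {
            t[0][c] = c == 'a' ? 1 : 0;
            t[1][c] = c == 'a' ? 0 : 1;
        }
        return t;
    }
}
```

Because int[][] is a subtype of Object[], the same table can be passed to both methods, which makes it easy to check that the layouts compute identical results.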

🤖 Generated with Claude Code

jbachorik marked this pull request as ready for review May 29, 2026 10:37
jbachorik requested a review from Copilot May 29, 2026 10:38
@chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5e2b8e2a1c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copilot AI left a comment

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.

Comment thread reggie-runtime/src/main/java/com/datadoghq/reggie/runtime/LazyDFACache.java Outdated
jbachorik force-pushed the feat/lazy-dfa-r1-r2 branch from 73c81d2 to 17d6e13 on May 29, 2026 11:36
jbachorik and others added 15 commits May 29, 2026 15:23
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…iler

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…imization

After rebasing onto jb/logs-backend, patterns like (?:[a-z][0-9]){200} that
were previously routed to LAZY_DFA now route to DFA_TABLE (the new table-driven
DFA backend fits these in under the 1 MB budget). LAZY_DFA is now triggered only
for patterns where DFA construction hits the 10 000-state explosion limit.

Switch test/benchmark patterns to (?:a+b+|b+a+){75} which genuinely causes
DFA state explosion while keeping the NFA small enough to avoid method-too-large
in the NFA delegate methods. Also removes the early-out shortcut in PatternAnalyzer
(it incorrectly blocked DFA_TABLE routing for patterns that fit in the table).
jbachorik force-pushed the feat/lazy-dfa-r1-r2 branch from c86528d to e9ece84 on May 29, 2026 13:23
@jbachorik (Collaborator, Author)

Rebased onto main (post-logs-backend merge)

Rebased 15 commits cleanly onto main (5f5706c) using --onto to drop the jb/logs-backend history and replay only the LAZY_DFA commits. All tests pass.

Updated benchmark results (post-rebase, (?:a+b+|b+a+){75}, JDK 21)

| Benchmark | ops/ms | ±error | Notes |
|---|---|---|---|
| hitPath (LAZY_DFA warm) | 2212 | ±264 | +90% vs JDK |
| hardMissPath (LAZY_DFA, late-failing all-[ab] input) | 2148 | ±182 | +182% vs JDK |
| jdkHitBaseline | 1163 | ±151 | |
| jdkHardMissBaseline | 762 | ±59 | |
| frozenPath (NFA fallback after cache freeze) | 1979 | ±224 | full traversal |

hardMissPath vs jdkHardMissBaseline is the fair miss comparison: all-[ab] inputs that fail after 60–74 complete groups, forcing real NFA/DFA traversal on both engines. The 2.8× gap closes to ~2× on hit, confirming LAZY_DFA earns its keep on non-trivial inputs.
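Late-failing all-[ab] inputs for (?:a+b+|b+a+){75} are easy to construct; the generator below is a hypothetical illustration, not necessarily the benchmark's data. Every group consumes at least two characters, so "ab" repeated k times can supply at most k groups: k = 75 matches, while any k from 60 to 74 fails only after all k groups have matched. Every such prefix is a viable prefix of a match, so a left-to-right DFA cannot reject before reaching the end of the input.

```java
import java.util.regex.Pattern;

// Hypothetical late-failing input generator for the (?:a+b+|b+a+){75} benchmark pattern.
final class HardMissInputs {
    static final Pattern PATTERN = Pattern.compile("(?:a+b+|b+a+){75}");

    // Each "ab" is consumed as exactly one a+b+ group.
    static String abPairs(int groups) {
        return "ab".repeat(groups);
    }
}
```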

Note on deterministic patterns: (?:[a-z][0-9]){200} and similar deterministic patterns now route to DFA_TABLE (the table-driven DFA backend added in this merge) rather than LAZY_DFA. LAZY_DFA specifically targets patterns where DFA construction hits the 10 000-state explosion limit.

🤖 Generated with Claude Code

@jbachorik (Collaborator, Author)

@codex review

@chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: db6a8974e0


Comment thread reggie-runtime/src/main/java/com/datadoghq/reggie/runtime/LazyDFACache.java Outdated
Copilot AI left a comment

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.

Comment thread docs/superpowers/specs/2026-05-28-lazy-dfa-design.md Outdated
…rage, spec doc

- LazyDFACache: add INT_ARRAY_VH for int[]; use setRelease/getAcquire on existing-table writes (item 3325667007)
- LazyDFABytecodeGenerator: replace IALOAD with VarHandle getAcquire in inlined hot loop (item 3325667007)
- LazyDFABenchmark FrozenState: change warm-up alphabet to "ab" so cache actually fills (item 3325673306)
- LazyDFABytecodeGeneratorTest: add match/matchBounded/findMatchFrom coverage (item 3325673394)
- ReggieMatcherBytecodeGeneratorTest: add LAZY_DFA processor end-to-end test (item 3325673350)
- docs: fix "c & 0x7F" → c < 128 guard description in lazy-dfa-design.md (item 3325673423)

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
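The setRelease/getAcquire change in this commit can be sketched with an array-element VarHandle. Only the name INT_ARRAY_VH comes from the commit message; the class and methods around it are illustrative assumptions. The release store of a transition pairs with the acquire load on the reader side, and VarHandle.storeStoreFence() keeps a freshly filled row's element stores from being reordered after the store that publishes the row reference, so a reader either sees null (and takes a slow path) or a fully written row.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical sketch of release/acquire table access and fenced row publication.
final class TablePublicationSketch {
    private static final VarHandle INT_ARRAY_VH = MethodHandles.arrayElementVarHandle(int[].class);

    // Writer: store a computed transition with release semantics.
    static void publishTransition(int[] row, char c, int nextState) {
        INT_ARRAY_VH.setRelease(row, (int) c, nextState);
    }

    // Reader: acquire load that pairs with the release store above.
    static int readTransition(int[] row, char c) {
        return (int) INT_ARRAY_VH.getAcquire(row, (int) c);
    }

    // Publish a fully filled row: the fence orders the row's element stores
    // before the store of the row reference into the table.
    static void publishRow(int[][] tables, int state, int[] row) {
        VarHandle.storeStoreFence();
        tables[state] = row;
    }
}
```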
jbachorik requested a review from Copilot May 29, 2026 19:38
@jbachorik (Collaborator, Author)

@codex review

Copilot AI left a comment

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 3 comments.

@chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3e514183ce


Comment thread .sphinx/address/coder-plan.md Outdated
@jbachorik (Collaborator, Author)

@codex review

@chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ecde02a95c


Copilot AI left a comment

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 3 comments.

Comment thread docs/superpowers/specs/2026-05-28-lazy-dfa-design.md Outdated
@jbachorik (Collaborator, Author)

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. More of your lovely PRs please.


Copilot AI left a comment

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated no new comments.

jbachorik merged commit a63fe8d into main May 29, 2026
9 checks passed
jbachorik deleted the feat/lazy-dfa-r1-r2 branch May 29, 2026 21:57
jbachorik added this to the 0.3.0 milestone May 29, 2026

Labels

AI Generated or assisted by AI

Projects

None yet


2 participants